Mobile
Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
Duan, Yicheng, Huang, Xi, Chen, Duo
The rapid growth of video content demands efficient and precise retrieval systems. While vision-language models (VLMs) excel in representation learning, they often struggle with adaptive, time-sensitive video retrieval. This paper introduces a novel framework that combines vector similarity search with graph-based data structures. By leveraging VLM embeddings for initial retrieval and modeling contextual relationships among video segments, our approach enables adaptive query refinement and improves retrieval accuracy. Experiments demonstrate its precision, scalability, and robustness, offering an effective solution for interactive video retrieval in dynamic environments.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
- North America > United States > California (0.04)
- North America > United States > Alabama > Mobile County > Mobile (0.04)
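The abstract of the video retrieval paper above describes a two-stage idea: an initial vector similarity search over VLM embeddings, followed by the use of graph relationships among video segments. The sketch below is only a toy illustration of that general pattern, not the authors' implementation. The embeddings, the adjacency dictionary `segment_graph`, and the helper names `cosine_top_k` and `expand_with_neighbors` are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def cosine_top_k(query, embeddings, k):
    """Return the indices of the k rows of `embeddings` most similar to `query`."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q  # cosine similarity of every segment to the query
    order = np.argsort(-scores)[:k]  # highest similarity first
    return [int(i) for i in order]


def expand_with_neighbors(seeds, graph):
    """Append graph neighbors of the seed segments, keeping order and skipping duplicates."""
    result = list(seeds)
    seen = set(seeds)
    for seed in seeds:
        for neighbor in graph.get(seed, []):
            if neighbor not in seen:
                seen.add(neighbor)
                result.append(neighbor)
    return result


# Hypothetical toy data: four video segments with 2-D embeddings.
segment_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
# Hypothetical contextual links between segments (e.g. temporal adjacency).
segment_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
query_embedding = np.array([1.0, 0.0])

seeds = cosine_top_k(query_embedding, segment_embeddings, k=1)
candidates = expand_with_neighbors(seeds, segment_graph)
print(seeds, candidates)
```

In this toy setup the similarity search supplies the seed segment and the graph step adds its contextual neighbor as a follow-up candidate. How the paper actually builds and uses its graph is not specified in the abstract.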
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
Abdallah, Abdelrahman, Mozafari, Jamshid, Piryani, Bhawna, Jatowt, Adam
Retrieval-Augmented Generation (RAG) models have drawn considerable attention in modern open-domain question answering. The effectiveness of RAG depends on the quality of the top retrieved documents. However, conventional retrieval methods sometimes fail to rank the most relevant documents at the top. In this paper, we introduce ASRank, a new re-ranking method that scores retrieved documents using a zero-shot answer scent, relying on a pre-trained large language model to compute the likelihood that the document-derived answers align with the answer scent. Our approach demonstrates marked improvements across several datasets, including NQ, TriviaQA, WebQA, ArchivalQA, HotpotQA, and Entity Questions. Notably, ASRank increases Top-1 retrieval accuracy on NQ from 19.2% to 46.5% for MSS and from 22.1% to 47.3% for BM25. It also shows strong retrieval performance on several datasets compared to state-of-the-art methods (47.3 Top-1 for ASRank vs. 35.4 for UPR, both with BM25).
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > Arizona (0.04)
- Research Report > Promising Solution (0.47)
- Research Report > New Finding (0.46)
- Media (1.00)
- Leisure & Entertainment > Sports (0.67)
- Energy > Power Industry > Utilities > Nuclear (0.67)
Tim Cook reveals his surprising first job - as the Apple CEO says he has been working since he was just 11
He is best known for being CEO of one of the world's largest companies. But before Tim Cook took the reins at Apple, he started his career in a very surprising place. Speaking on the Table Manners podcast, Mr Cook revealed that he started working when he was just 11 years old. He says: 'A lot of [his upbringing] was centred on work and the belief that hard work was essential for everybody, regardless of your age. 'And so I started working when I was probably 11 or 12 on the paper route.'
- North America > United States > Alabama > Mobile County > Mobile (0.05)
- Asia > China (0.05)
- North America > United States > California > Santa Clara County > Cupertino (0.05)
- North America > United States > California > San Bernardino County > San Bernardino (0.05)
- Leisure & Entertainment (0.96)
- Health & Medicine (0.73)
- Media > Music (0.70)
- Information Technology > Communications > Mobile (0.94)
- Information Technology > Artificial Intelligence (0.71)
Exploring Large Language Models for Climate Forecasting
With the increasing impacts of climate change, there is a growing demand for accessible tools that can provide reliable future climate information to support planning, finance, and other decision-making applications. Large language models (LLMs), such as GPT-4, present a promising approach to bridging the gap between complex climate data and the general public, offering a way for non-specialist users to obtain essential climate insights through natural language interaction. However, an essential challenge remains under-explored: evaluating the ability of LLMs to provide accurate and reliable future climate predictions, which is crucial for applications that rely on anticipating climate trends. In this study, we investigate the capability of GPT-4 in predicting rainfall at short-term (15-day) and long-term (12-month) scales. We designed a series of experiments to assess GPT's performance under different conditions, including scenarios with and without expert data inputs. Our results indicate that GPT, when operating independently, tends to generate conservative forecasts, often reverting to historical averages in the absence of clear trend signals. This study highlights both the potential and challenges of applying LLMs for future climate predictions, providing insights into their integration with climate-related applications and suggesting directions for enhancing their predictive capabilities in the field.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.04)
- North America > United States > Florida > Escambia County > Pensacola (0.04)
Multi-environment Topic Models
Sobhani, Dominic, Feder, Amir, Blei, David
Probabilistic topic models are a powerful tool for extracting latent themes from large text datasets. In many text datasets, we also observe per-document covariates (e.g., source, style, political affiliation) that act as environments that modulate a "global" (environment-agnostic) topic representation. Accurately learning these representations is important for prediction on new documents in unseen environments and for estimating the causal effect of topics on real-world outcomes. To this end, we introduce the Multi-environment Topic Model (MTM), an unsupervised probabilistic model that separates global and environment-specific terms. Through experimentation on various political content, from ads to tweets and speeches, we show that the MTM produces interpretable global topics with distinct environment-specific words. On multi-environment data, the MTM outperforms strong baselines both in- and out-of-distribution. It also enables the discovery of accurate causal effects.
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Iraq (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
Luo, Linhao, Zhao, Zicheng, Gong, Chen, Haffari, Gholamreza, Pan, Shirui
Large language models (LLMs) have demonstrated impressive reasoning abilities, but they still struggle with faithful reasoning due to knowledge gaps and hallucinations. To address these issues, knowledge graphs (KGs) have been utilized to enhance LLM reasoning through their structured knowledge. However, existing KG-enhanced methods, either retrieval-based or agent-based, encounter difficulties in accurately retrieving knowledge and efficiently traversing KGs at scale. In this work, we introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs. To eliminate hallucinations, GCR ensures faithful KG-grounded reasoning by integrating KG structure into the LLM decoding process through KG-Trie, a trie-based index that encodes KG reasoning paths. KG-Trie constrains the decoding process, allowing LLMs to directly reason on graphs and generate faithful reasoning paths grounded in KGs. Extensive experiments on several KGQA benchmarks demonstrate that GCR achieves state-of-the-art performance and exhibits strong zero-shot generalizability to unseen KGs without additional training. Code is available at https://github.com/RManLuo/
Large language models (LLMs) have shown impressive reasoning abilities in handling complex tasks (Qiao et al., 2023; Huang & Chang, 2023), marking a significant leap that bridges the gap between human and machine intelligence. These issues result in factual errors and flawed reasoning processes (Nguyen et al., 2024), which greatly undermine the reliability of LLMs in real-world applications. To address these issues, many studies utilize knowledge graphs (KGs), which encapsulate extensive factual information in a structured format, to improve the reasoning abilities of LLMs (Pan et al., 2024; Luo et al., 2024). Nevertheless, because of the unstructured nature of LLMs, directly applying them to reason on KGs is challenging.
Existing KG-enhanced LLM reasoning methods can be roughly categorized into two groups: retrieval-based and agent-based paradigms, as shown in Figure 2 (a) and (b).
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Alabama > Mobile County > Mobile (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
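The graph-constrained reasoning abstract above describes KG-Trie as a trie-based index over KG reasoning paths that constrains decoding. The following is a minimal, hedged sketch of that general idea at the level of whole words rather than LLM tokens. The toy paths, the `toy_scores` table standing in for model scores, the `END` marker, and all function names are assumptions made for illustration and are not taken from the paper or its code.

```python
END = "<end>"  # hypothetical marker for the end of a stored path


def build_trie(paths):
    """Build a nested-dict trie from a list of reasoning paths (lists of strings)."""
    root = {}
    for path in paths:
        node = root
        for step in path:
            node = node.setdefault(step, {})
        node[END] = {}  # mark that a complete path ends here
    return root


def allowed_next(trie, prefix):
    """Return the set of steps that may follow `prefix` according to the trie."""
    node = trie
    for step in prefix:
        if step not in node:
            return set()
        node = node[step]
    return set(node.keys())


def constrained_greedy_decode(trie, score, max_len=10):
    """Greedily pick the best-scoring step among those the trie allows."""
    prefix = []
    for _ in range(max_len):
        options = allowed_next(trie, prefix)
        if not options:
            break
        # sorted() makes tie-breaking deterministic
        best = max(sorted(options), key=lambda step: score(prefix, step))
        if best == END:
            break
        prefix.append(best)
    return prefix


# Hypothetical reasoning paths extracted from a toy knowledge graph.
paths = [
    ["Alice", "born_in", "Paris"],
    ["Alice", "works_at", "Acme"],
    ["Paris", "capital_of", "France"],
]
trie = build_trie(paths)

# Stand-in for model scores; "Mars" scores highest but is not in any KG path.
toy_scores = {"Alice": 0.9, "Paris": 0.5, "born_in": 0.2,
              "works_at": 0.7, "Acme": 0.6, "Mars": 1.0, END: 0.1}
decoded = constrained_greedy_decode(trie, lambda prefix, step: toy_scores.get(step, 0.0))
print(decoded)
```

Because every candidate step comes from the trie, the unsupported step "Mars" can never be emitted, however highly the stand-in scorer rates it. That is the sense in which a trie can restrict generation to paths that exist in the graph.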
Recall: Empowering Multimodal Embedding for Edge Devices
Cai, Dongqi, Wang, Shangguang, Peng, Chen, Zhang, Zeling, Xu, Mengwei
Human memory is inherently prone to forgetting. To address this, multimodal embedding models have been introduced, which transform diverse real-world data into a unified embedding space. These embeddings can be retrieved efficiently, aiding mobile users in recalling past information. However, as model complexity grows, so do its resource demands, leading to reduced throughput and heavy computational requirements that limit mobile device implementation. In this paper, we introduce RECALL, a novel on-device multimodal embedding system optimized for resource-limited mobile environments. RECALL achieves high-throughput, accurate retrieval by generating coarse-grained embeddings and leveraging query-based filtering for refined retrieval. Experimental results demonstrate that RECALL delivers high-quality embeddings with superior throughput, all while operating unobtrusively with minimal memory and energy consumption.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > Alabama > Mobile County > Mobile (0.04)
- Information Technology > Services (0.68)
- Information Technology > Security & Privacy (0.67)
MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion Segmentation
Yuan, Chunyu, Zhao, Dongfang, Agaian, Sos S.
Skin lesion segmentation is key for early skin cancer detection. Challenges in automatic segmentation from dermoscopic images include variations in color and texture, artifacts, and indistinct lesion boundaries. Deep learning methods like CNNs and U-Net have shown promise in addressing these issues. To further aid early diagnosis, especially on mobile devices with limited computing power, we present MUCM-Net. This efficient model combines Mamba State-Space Models with our UCM-Net architecture for improved feature learning and segmentation. MUCM-Net's Mamba-UCM Layer is optimized for mobile deployment, offering high accuracy with low computational needs. Tested on ISIC datasets, it outperforms other methods in accuracy and computational efficiency, making it a scalable tool for early detection in settings with limited resources. Our MUCM-Net source code is available for research and collaboration, supporting advances in mobile health diagnostics and the fight against skin cancer. To facilitate accessibility and further research in the field, the MUCM-Net source code is available at https://github.com/chunyuyuan/MUCM-Net
- North America > United States > New York > Richmond County > New York City (0.04)
- North America > United States > Alabama > Mobile County > Mobile (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (0.59)
HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning
Kim, Gyudong, Ghasemi, Mehdi, Heidari, Soroush, Kim, Seungryong, Kim, Young Geun, Vrudhula, Sarma, Wu, Carole-Jean
Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device. In FL, participating user-end devices are highly fragmented in terms of hardware and software configurations. Such fragmentation introduces a new type of data heterogeneity in FL, namely system-induced data heterogeneity, as each device generates distinct data depending on its hardware and software configurations. In this paper, we first characterize the impact of system-induced data heterogeneity on FL model performance. We collect a dataset using heterogeneous devices with variations across vendors and performance tiers. Using this dataset, we demonstrate that system-induced data heterogeneity negatively impacts accuracy and exacerbates fairness and domain generalization problems in FL. To address these challenges, we propose HeteroSwitch, which adaptively adopts generalization techniques (i.e., ISP transformation and SWAD) depending on the level of bias caused by varying HW and SW configurations. In our evaluation with a realistic FL dataset (FLAIR), HeteroSwitch reduces the variance of averaged precision by 6.3% across device types.
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- North America > United States > Arizona (0.04)
- North America > United States > Alabama > Mobile County > Mobile (0.04)
- Information Technology (0.68)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)
- Health & Medicine > Diagnostic Medicine (0.46)
CityTFT: Temporal Fusion Transformer for Urban Building Energy Modeling
Dai, Ting-Yu, Niyogi, Dev, Nagy, Zoltan
Urban Building Energy Modeling (UBEM) is an emerging method to investigate urban design and energy systems against the increasing energy demand at urban and neighborhood levels. However, current UBEM methods are mostly physics-based and time-consuming in multiple climate change scenarios. This work proposes CityTFT, a data-driven UBEM framework, to accurately model the energy demands in urban environments. Building on the underlying TFT framework and an augmented loss function, CityTFT could predict heating and cooling triggers in unseen climate dynamics with an F1 score of 99.98% and a load RMSE of 13.57 kWh.
- North America > United States > Texas > Travis County > Austin (0.15)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Energy (1.00)
- Construction & Engineering > HVAC (0.53)